Clustering and Visualization of Large Protein Sequence Databases by Means of an Extension on the Self-Organizing Map
Authors
Abstract
New, more effective software tools are needed for the analysis and organization of the continually growing biological databases. In this work, an extension of the Self-Organizing Map (SOM) is used to cluster all 77,977 protein sequences of the SWISS-PROT database, release 37. Unlike some previous methods, this one does not convert the data sequences into histogram vectors in order to perform the clustering. Instead, it automatically finds a collection of true representative model sequences that approximate the contents of the database in a compact way. The models are based on the concept of the generalized median of symbol strings and are computed after the user has defined a suitable similarity measure for the sequences, such as Smith-Waterman, BLAST, or FASTA. The FASTA method is used in this work. The benefits of the SOM, and of its extension, are fast computation, approximate representation of the large database by a much smaller, fixed number of model sequences, and easy interpretation of the clustering through visualization. The complete sequence database is mapped onto a two-dimensional graphic SOM display, and clusters of similar sequences are then found and made visible by indicating the degree of similarity of adjacent model sequences with shades of gray.
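The idea of a representative model sequence can be illustrated with a minimal sketch. It is not the authors' implementation and rests on two loudly stated assumptions: plain Levenshtein edit distance stands in for the FASTA similarity measure used in the work, and the set median (the member of a set with the smallest total distance to all other members) stands in for the generalized median, which in general need not be a member of the set. The function names `edit_distance`, `set_median`, and `nearest_model` are hypothetical.

```python
from typing import Callable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumption: a stand-in for a FASTA-based measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def set_median(strings: Sequence[str],
               dist: Callable[[str, str], int] = edit_distance) -> str:
    """Return the member of `strings` with the smallest summed distance to all members.

    This is the set median, used here as a simple approximation of the
    generalized median of symbol strings.
    """
    if not strings:
        raise ValueError("set_median needs at least one string")
    return min(strings, key=lambda c: sum(dist(c, s) for s in strings))


def nearest_model(seq: str, models: Sequence[str],
                  dist: Callable[[str, str], int] = edit_distance) -> int:
    """Index of the model sequence closest to `seq` (the best-matching map unit)."""
    return min(range(len(models)), key=lambda k: dist(seq, models[k]))


if __name__ == "__main__":
    cluster = ["ACDE", "ACDF", "ACDE", "WWWW"]
    model = set_median(cluster)
    print(model, nearest_model("ACDG", ["WWWW", model]))
```

Because `dist` is passed in as a parameter, any other pairwise measure could be substituted without changing the rest of the sketch, which mirrors the abstract's point that the similarity measure is chosen by the user.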
Similar resources
NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...
Gait Based Vertical Ground Reaction Force Analysis for Parkinson’s Disease Diagnosis Using Self Organizing Map
The aim of this work is to use Self Organizing Map (SOM) for clustering of locomotion kinetic characteristics in normal and Parkinson’s disease. The classification and analysis of the kinematic characteristics of human locomotion has been greatly increased by the use of artificial neural networks in recent years. The proposed methodology aims at overcoming the constraints of traditional analysi...
Publication date: 2000